Creating Language Resources for Nlp in Indian Languages

نویسندگان

Rajeev Sangal

Dipti Misra Sharma

چکیده

Non-availability of lexical resources in the electronic form is a major bottleneck for anyone working in the field of NLP on Indian languages. Some measures were taken to alleviate this bottleneck in a quick and efficient way. It was felt that if the development of these resources is linked with an example application then it can act as a test bed for the developing resources and provide constant feedback. Moreover, immediate results in terms of a performing system also enthuses the developers for such time consuming jobs. It was decided to take up the building of a machine translation system as an example application, which would also serve as a vehicle for building lexical resources.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Invited Talk: Breaking the Zipfian Barrier of NLP

We know that the distribution of most of the linguistic entities (e.g. phones, words, grammar rules) follow a power law or the Zipf's law. This makes NLP hard. Interestingly, the distribution of speakers over the world, content over the web and linguistic resources available across languages also follow power law. However, the correlation between the distribution of number of speakers to that o...

متن کامل

Tagging Ingush - Language Technology For Low-Resource Languages Using Resources From Linguistic Field Work

This paper presents on-going work on creating NLP tools for under-resourced languages from very sparse training data coming from linguistic field work. In this work, we focus on Ingush, a Nakh-Daghestanian language spoken by about 300,000 people in the Russian republics Ingushetia and Chechnya. We present work on morphosyntactic taggers trained on transcribed and linguistically analyzed recordi...

متن کامل

Śata-Anuvādak : Tackling Multiway Translation of Indian Languages

We present a compendium of 110 Statistical Machine Translation systems built from parallel corpora of 11 Indian languages belonging to the Indo-Aryan and Dravidian families. We analyze the relationship between translation accuracy and the language families involved. We feel that insights obtained from this analysis will provide guidelines for creating machine translation systems for specific In...

متن کامل

An OLAC Extension for Dravidian Languages

OLAC was founded in 2000 for creating online databases of language resources. This paper intends to review the bottom-up distributed character of the project and proposes an extension of the architecture for Dravidian languages. An ontological structure is considered for effective natural language processing (NLP) and its advantages over statistical methods are reviewed

متن کامل